The Alignment Problem: Machine Learning and Human Values

  • Downloads: 7306
  • Type: EPUB+TXT+PDF+MOBI
  • Create Date: 2021-10-23 09:19:10
  • Update Date: 2025-09-07
  • Status: finished
  • Author: Brian Christian
  • ISBN: 0393868338
  • Environment: PC/Android/iPhone/iPad/Kindle

Reviews

Tim

My main insight from this book is that alignment is a far more concrete problem than I imagined. I assumed that alignment simply meant making sure bad people don't have control of the goals of AI, and that we're thoughtful about crafting those goals. Instead, AI safety research is about engineering systems that can understand and carefully act in environments with ambiguous or contradictory goals. That was a very abstract sentence, so let me try to explain it again.

A big problem for AI safety is that human intentions are implicit. The famous paperclip problem - what if an AI is told to make paperclips and it stops at nothing to turn the whole world into paperclips? - is posed because the AI doesn't understand that humans care about things other than paperclips. But instead of training a model to make paperclips, you can train it to understand human intentions. Or you can train it to be cautious about taking actions that change the world in any significant way. There's real, concrete research ongoing in this field. I'm excited to go learn more about the specific math and computer science behind these ideas.

This also changed my perspective of AI safety from a negative research field spurred by fear to a positive research field motivated by creation and hope. AI safety research actually makes AI more useful, rather than only telling us that we should be very careful what we build.

I wonder if someone less familiar with machine learning (not that I'm an expert, but I've taken stats and intro ML courses) would have trouble understanding this book. For instance, as far as I recall, the book doesn't describe specifically how neural networks work. Maybe that's a strength of the book, though.
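The "cautious about world-changing actions" idea this review points to is often formalized as an impact penalty: the agent earns its task reward minus a charge for pushing the world away from what would have happened had it done nothing. A minimal sketch under that assumption - the states, numbers, and function names below are illustrative, not the book's:

```python
import numpy as np

# Toy impact-penalized reward: task reward minus a charge proportional to
# how far the action drags the world from the "do nothing" baseline state.
# All states and coefficients here are made up for illustration.

def impact_penalized_reward(task_reward, next_state, baseline_state, beta=0.5):
    """Task reward minus beta times the deviation from the no-op baseline."""
    impact = np.linalg.norm(np.asarray(next_state) - np.asarray(baseline_state))
    return task_reward - beta * impact

# A drastic action that pays 1.0 scores worse than a gentler action paying 0.8,
# because the penalty accounts for its side effects:
print(impact_penalized_reward(1.0, [3.0, 0.0], [0.0, 0.0]))  # 1.0 - 0.5*3.0 = -0.5
print(impact_penalized_reward(0.8, [0.5, 0.0], [0.0, 0.0]))  # 0.8 - 0.5*0.5 = 0.55
```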

Hassan Hayastani

Great. A solid "knowing how" and "knowing that" on AI and, now, EI.

Jacob Williams

"We are in danger of losing control of the world not to AI or to machines as such but to models。" This is full of interesting historical anecdotes (like that time William James kept a bunch of chickens in his basement to help out a student) and good high-level explanations of various approaches to machine learning。Perhaps the most shocking issue discussed is how some US state justice systems used a model (called COMPAS) from a third-party provider for years to guide bail and sentencing decisions "We are in danger of losing control of the world not to AI or to machines as such but to models。" This is full of interesting historical anecdotes (like that time William James kept a bunch of chickens in his basement to help out a student) and good high-level explanations of various approaches to machine learning。Perhaps the most shocking issue discussed is how some US state justice systems used a model (called COMPAS) from a third-party provider for years to guide bail and sentencing decisions without doing any sort of validation of the efficacy or fairness of the model。 Christian also gives compelling examples of how dangerous it can be to naively trust a model you don't understand, like the case where a pneumonia-diagnosis model was accurately predicting that some patients were less likely to die of pneumonia: it turned out the reason they were less likely to die is that they had extra health conditions which caused hospital staff to view them as higher-risk and give them additional care。 So if the staff had started trusting the model's predictions instead, those patients would have likely been at even higher risk of dying than they were to begin with。 Trying to act on the model's advice would have undermined the model's accuracy!Still, although the description of this book on goodreads calls it a "jaw-dropping exploration of everything that goes wrong when we build AI systems", I found it to take a pretty measured attitude towards the problem, especially in parts 2 and 3。 The general impression it left me with is that there are very smart people working hard on making AI safe, and that they've got some good ideas。 The question, I guess, is whether society will listen when they urge caution, or if overeager deployment of stuff like COMPAS will be the norm。 。。。more

David V

This is a thoroughly researched and comprehensive overview of machine learning, its history, its features, and its flaws/risks. I guess I was expecting something a bit more like Algorithms to Live By by the same author and therefore found this book much drier (i.e., less entertaining) than I would have liked. I would have appreciated more examples and a deeper dive into the socio-economic implications of AI - e.g., policing, education policy and admissions, job searching and hiring, etc. This may be a case of "it's not you, it's me", but my 3-star rating is in recognition of the quality of the book, tempered by a lack of personal enjoyment.

Sarah

Readable yet technical examination of AI safety problems and the challenges of training machine learning models, in a field that changed even as I read this book (with a new paper announcing a model that could produce "I don't know" published in Nature a week ago). Well-cited and in-depth; approach this book with time and mental space to learn.

Tarmo Pungas

Machine learning and human learning are not so different. This book shows how a machine can learn from humans, and how humanity can look at such a machine and learn much about itself. The book isn't very technical and is therefore easy to understand without a relevant background. It convinced me that AI alignment is an important problem to work on, and it also managed to convey *why* it's important and difficult.

Christoph Pröschel

Beyond the headlines of astonishing achievements in machine learning, a much bigger question about its interaction with society and the "what" of good algorithmic design has been gathering speed. This book manages to expertly introduce readers of any technical level not just to this higher-level discussion, but to the fundamentals of machine learning in general. Therefore it will be my new go-to recommendation for both technical and non-technical people who are interested in the field.

A

Not an easy read, but every page had something new to teach. If you haven't read Brian's previous book, "Algorithms to Live By", I'd recommend reading that first.

Tommy

The Alignment Problem was phenomenal and I would highly recommend it to anyone who is even remotely interested in machine learning, how algorithms shape modern life, or even the parallels between psychology and artificial intelligence. My main background in AI is from an extensive article on Wait But Why, which explained much more of the future cases of what artificial general intelligence would mean for our society. The Alignment Problem, however, goes into the nuts and bolts of both the history and the current implementation of machine learning, including its successes as well as its multitude of pitfalls. Ultimately, this book gave me hope in the future of machine learning, not because AI itself is so cool, but because there are so many people working to make it ethical, just, and amazing.

"We find ourselves at a fragile moment in history - where the power and flexibility of these models have made them irresistibly useful for a large number of commercial and public applications, and yet our standards and norms around how to use them appropriately are still nascent." (page 48)

I read this voraciously and enjoyed it so much that I think I might buy it so that I can reread it. I must also give the caveat that most of my reading of this book occurred in somewhat of a fugue state: sleep-deprived on a Greyhound bus. Nonetheless, I still believe The Alignment Problem to be enthralling. I absolutely loved the way that Christian writes, equally erudite and strikingly approachable. When there is a new topic that he wants the reader to learn about, he has a unique way of bringing it up that I found to be extremely effective. First, he describes an everyday situation; then he gives a formal definition of the subject/topic/term; and finally he explains how it is relevant or how it applies in the real world. In essence, he invites the reader to build an intuition of a new topic, tells you that you kind of already know what this is - but he puts a new name to it - and then he shows you how it is quite a bit more amazing than you thought. I think more people ought to teach in this way; to me, this is near the Platonic ideal of how to teach.

Furthermore, it was quite clear that Christian did his research for The Alignment Problem. When he says that he did hundreds of interviews, I do not doubt him at all. I must also address my earlier comment about how this book is extremely approachable in its prose. Since a lot of this book was based not only on original research but also relied heavily on personal interviews, Christian gave direct quotes of the way that people spoke (including their dialects/mannerisms of speaking) and also used syntactical tools such as ellipses to great effect.

I'll try not to gush too much more about this book, but I must also point out that I loved how much he integrated psychology into it. He could almost write an entire book just on how our brains work and I would love it equally. Since this book was about machine learning and human values, Christian had to adequately address the latter portion of the subtitle, and boy did he deliver! I especially enjoyed the chapters on Imitation and Inference, where he described how we are trying to include human values in our AI either by - you guessed it - having the machines imitate us or infer what we are doing. Lengthy sections of the book spoke exclusively on neuroscience (such as how dopamine is a "reward chemical" based not on the reward itself, but actually on how reality differs from our expectation of the future).

Finally, I'll leave you with one of my favorite justifications for why you ought to learn more about this, from the conclusion (page 327): "Increasingly, our interaction with almost any system involves a formal model of our own behavior... What we have seen in this book is the power of these models, the ways they go wrong, and the ways we are trying to align them with our interest."
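The dopamine observation at the end of this review - that the signal tracks how reality differs from expectation, not the reward itself - is the temporal-difference error of reinforcement learning. A minimal TD(0) sketch, with made-up states, reward, and constants:

```python
# Minimal TD(0) sketch: the learning signal is the prediction error
# (reward received versus reward expected), not the raw reward itself.
# States, reward, and constants are illustrative only.

values = {"cue": 0.0, "juice": 0.0}   # learned expectations per state
alpha, gamma = 0.1, 0.9               # learning rate, discount factor

def td_update(state, reward, next_state):
    delta = reward + gamma * values[next_state] - values[state]  # prediction error
    values[state] += alpha * delta
    return delta  # the dopamine-like signal

# Early trials: the reward is surprising, so delta is large. After many
# trials the cue fully predicts the reward and delta approaches zero.
for _ in range(50):
    delta = td_update("cue", reward=1.0, next_state="juice")
print(round(values["cue"], 3), round(delta, 4))  # expectation near 1.0, error near 0
```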

Simon Zou

This book covers a lot of historical and recent research into AI and AI safety, and it hits a really nice sweet spot pedagogically, providing a high-level overview, technical detail, and placement in the broader context of AI as well as other disciplines. More importantly, it covers the limitations, assumptions, and corrective actions the field is trying to make. I had known about some of the ways algorithms could be biased against minorities, but it never clicked for me until reading this that that is a built-in aspect of the algorithms themselves. Learning algorithms perform worse on less data, and there is literally less data on minorities in any proportionally representative data set. I learned a lot about the progress that's been made and the work that's yet to be done.

It also provides perhaps the most up-to-date explanation of how dopamine functions. Interesting that it came up in a computer science book as opposed to the other science or pop psychology books I've read, but the key insight that it's triggered by an improvement in our expectations, or temporal difference learning, came from studying machine intelligence, after simpler explanations like pleasure and novelty were shown insufficient for explaining certain experimental results. This also provides a funny summary of the human condition: surprise is what gives us motivation, and it is also the mechanism by which we are constantly updating our expectations so that we can be surprised less often.

This book uses great quotes for the chapter headings and has great quotes from esteemed experts in the field. I was continuously struck by the cleverness of researchers in finding ways around problems, and then struck back by the problems those workarounds unveiled. The idea of formalizing a notion of curiosity and having that improve game-playing AI was particularly cool.

This book discusses aligning machines with human values. It touches briefly on the reverse, but I could not help feeling that we're almost getting to the point, if we're not already there, of having to give some moral consideration to the AI we're creating, as the most advanced ones have shown themselves capable of behavior that emulates boredom and addiction. An ominous quote used in the book is that so far, humans have been limited in the damage they can cause by the technology we have. Understanding how to use new technology wisely is by its nature going to lag behind discovering new technology. Christian is a self-proclaimed optimist and is ultimately optimistic. I'm a little less sure.

The book describes the growth and mainstreaming of the AI safety field, but I couldn't help being struck by the lack of diversity in the experts interviewed. Not just in terms of gender or race, but in terms of the institutions these experts came from. It's where you might expect: places like Stanford, Berkeley, Harvard, MIT, Oxford, Cambridge, Google DeepMind, and OpenAI. It was just striking to me that, even with a recent graduate education in computer science from a top-20 university, and surveying the websites of other top CS departments just outside this super elite class, my formal schooling covered basically none of this and it wasn't even really an option.

To summarize: super interesting and important topic, well written and educational, and I got dopamine from the improvement over my expectations for this book (which were already pretty high from listening about it on podcasts). 5/5.
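The "formalized curiosity" this review mentions is commonly implemented as an intrinsic reward equal to the error of the agent's own forward model, so surprising observations pay a bonus that fades as the model improves. A toy sketch, with a running-average predictor standing in for the learned model; every name and number below is illustrative:

```python
import numpy as np

# Toy curiosity bonus: intrinsic reward = the agent's own prediction error.
# Real systems learn a neural forward model; a dict of running predictions
# stands in for it here. All values are illustrative.

predictions = {}  # state -> predicted next observation

def curiosity_reward(state, next_obs, lr=0.5):
    pred = predictions.get(state, np.zeros_like(next_obs))
    surprise = float(np.linalg.norm(next_obs - pred))     # prediction error
    predictions[state] = pred + lr * (next_obs - pred)    # model improves
    return surprise  # paid as a bonus on top of any task reward

obs = np.array([1.0, 2.0])
print(curiosity_reward("room_A", obs))  # first visit: large bonus (~2.24)
print(curiosity_reward("room_A", obs))  # revisit: bonus decays (~1.12)
```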

Casey Dorman

I must admit that I was taken by surprise by the contents of Brian Christian's recent book, The Alignment Problem. The book came out in 2020 and made quite a splash in the artificial intelligence (AI) and machine intelligence community. Much of the public, including myself, had been made aware of "the alignment problem" by Nick Bostrom's book, Superintelligence, or the writings of people such as MIT physicist Max Tegmark. In fact, in my case, it was the conundrum of the alignment problem that spurred me to write my science fiction novel, Ezekiel's Brain. Simply put, the alignment problem in the AI world is the question of how you create a superintelligent AI that is "friendly," i.e., helpful rather than dangerous, to humanity. It's such a difficult question that, in my novel, the creators of the superintelligent AI fail, and the result is disastrous for the human race. What I was expecting from Brian Christian's book was another description of the nightmare scenarios of the kind I wrote about in my novel, and experts such as Bostrom and Tegmark talk about in their writings. That wasn't what The Alignment Problem was about... or at least not what it was mostly about.

Christian gives some detailed accounts of disastrous results from applying the most sophisticated AI learning algorithms to actual human situations. Some of these are well known, such as attempts to censor social media content, to produce an algorithm that aided judges in criminal sentencing, or to develop screening tools for employment selection. Training AIs using data on human decisions simply amplified the biases - gender, racial, and ideological - that we humans use to make our decisions. These were instances of AIs performing in a way that was more harmful than helpful to humans, and they were results of which I had previously been only vaguely aware. Although they were not the kind of misalignment that concerned me and had prompted me to buy the book, they expanded my concept of alignment considerably.

Instead of providing nightmare scenarios of the danger of superintelligent AIs that are not aligned with what is best for humanity, the bulk of Christian's book provides an exquisite history, up to the present, of the efforts of the AI community to define how machines can learn, what they are learning and what they ought to be learning, and how to identify whether the progress being made is bringing AIs into closer alignment with what humans want from them. What was most surprising and gratifying to me, as a psychologist, was how much this effort is entwined with progress in understanding how people learn and what affects that learning process. Christian writes his book like a good mystery, but rather than following a narrow plot, the breadth of inquiry is extraordinary. Even a psychologist such as myself learned about findings in psychology, learning, and child development of which I was unaware. How the computer scientists who develop AI use psychological findings to open up new avenues in machine learning is fascinating to hear about. The collaborations are thrilling, and both psychologists and AI researchers who are not aware of how much is happening on this front should read Christian's book to get an idea of how exciting and important this area of research is becoming. Although I have some background related to psychology, AI, and the alignment problem, this book is written for the non-expert, and the interested layperson can easily understand it and bring their knowledge of the subject up to date. I found it one of the most captivating and informative books I have read in the last several years, and I recommend it to everyone for whom this topic sparks an interest.

Glenn

Occasionally, I stumble onto a book whose stories and concepts generate so many new ideas that they keep me up at night. The Alignment Problem is one of those books. Not entirely unexpectedly, the subject of the title (alignment) is older than the study of AI or even the study of computers. It is about the problem of getting results to align with plans and intentions; the problem of avoiding Murphy's Law, of somehow dodging the inevitable pressure of entropy as we try to navigate our way into our "ideal technological tomorrow". Project managers know that only 40% of projects are considered successful. Given any complex task, we humans frequently screw things up and find ourselves shoveling our way out of an unpleasant pile of unintended consequences.

Many of us are proud that we have somehow lived through the nuclear age for eighty years without destroying ourselves. We've had the luxury of time to conduct diplomacy, construct treaties, and develop social consensus. But the evolution of AI may not give us the time to adjust. To prevent potential (species-ending) consequences, it may be necessary to communicate something resembling human morality to neural networks - and get it right the first time. Regard for the continued existence of our species (even for the life of one human) is not something that comes naturally to an AI. Now consider the more complex issues of "personal freedom", "privacy", "racial bias", "environmental collapse", etc. Most humans understand that a climate collapse could be "game over" for our species - or that the engine of our economy, our collective wealth, is nothing but numbers in databases. Communicating the importance of these ideas to AIs may be challenging, and the failure to communicate them could have irrecoverable consequences.

Examples of these difficulties abound in Brian Christian's book. Take, for example, the AI that realizes, faced with the problem of winning a boat race, that it can score just as highly by spinning around in circles, running into things, and setting the boat on fire. Or the Google algorithm that, when told to search for "gorillas", displayed the faces of African Americans (including one of the key researchers of the algorithm). Or the example of the Amazon AI that, when evaluating prospective candidates for jobs, selected those that most mirrored the ones already hired - mostly white males. It didn't matter that race and gender were omitted from the data. The data the AIs were taught from included names, schools, even the way the resume was worded to identify the best fits - all of which inadvertently revealed gender and race. Amazon discovered the problem and abandoned the software. Not so the legal system, which, for almost a decade, used a parole recommendation system that was biased against black males - regardless of actual criminal record.

Actually, neither the AIs nor the researchers are generally biased. However, they draw from "real data", and our real data is skewed by the bias of the humans from which it came. Biases can show up in unexpected places - for example, in photographs, where (for decades) companies used Caucasian models to calibrate their film. This radically inflated the error rate when facial recognition systems had to look at an African American - especially African American women.

One of the author's trains of thought (and perhaps a good one to close on) is the principle that "you shall know a word by the company it keeps". AI has proven to be very effective in analyzing our texts, associating words based on their proximity. This allows it to track our evolving language and its embedded biases. Brian Christian muses on a possible dashboard representing how language and human (gender, racial, and other) biases change over time. This seems to be something we can do now, or at least soon: review a big picture of how we think and talk about others who are different from us. Perhaps having a non-human intelligence periodically send our species a report card is a good thing. We might actually find things about ourselves that we want to change. Imagine that: an AI as a Jiminy Cricket, whispering the truth in our ear so that we might become better people :)
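"You shall know a word by the company it keeps" is the distributional idea behind word embeddings, and the bias-dashboard musing can be made concrete: project word vectors onto a he-she direction and watch the scores. A tiny sketch with fabricated 3-d vectors (real embeddings have hundreds of dimensions, and the example words are only illustrations):

```python
import numpy as np

# Probing an embedding for gender association, in the spirit of the
# word2vec bias analyses the book describes. Vectors are fabricated.

emb = {
    "he":       np.array([ 1.0, 0.1, 0.0]),
    "she":      np.array([-1.0, 0.1, 0.0]),
    "engineer": np.array([ 0.6, 0.8, 0.1]),
    "nurse":    np.array([-0.7, 0.7, 0.1]),
}

gender_axis = emb["he"] - emb["she"]  # direction separating "he" from "she"

def gender_score(word):
    v = emb[word]
    return float(v @ gender_axis / (np.linalg.norm(v) * np.linalg.norm(gender_axis)))

# Positive leans "he", negative leans "she": the embedding has absorbed the
# occupational stereotype present in whatever text it was trained on.
print(round(gender_score("engineer"), 2))  # ~0.60
print(round(gender_score("nurse"), 2))     # ~-0.70
```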

Vijai

On March 23rd, 2016, Microsoft threw their Twitter chatbot at the world. It started all innocent, what with the chatbot claiming that humans are cool. Not 24 hours later, on March 24th, 2016, the very same chatbot proclaimed that Hitler did nothing wrong and wished that feminists would burn in hell. Here's the link to that saga. So, why did this ambitious machine learning project go tits up? Because the chatbot was modeling itself on the shit that was already in the public domain, like it was designed to. Like parents, like children. Garbage in, garbage out. You get my drift? Machine learning and AI just don't become sentient from the word go. They are modelled on what real people have already done. A wonderful concept, assuming that the existing data the bots are sifting through all comes from well-meaning, ideal humans. Unfortunately, us humans aren't. Which is the crux of this author's main thrust in this book. What if the brilliant ML programmer just happened to be a flaming misogynist? Is there a chance, a teeny tiny one, that his ML could have a gender bias? I am not even conjecturing; it has already happened, as noted by the author in this book. A wonderful book that only vindicated the fears I have had about this recent mad rush behind AI and ML. It's like communism: the idea is all cool on paper and may well suit a utopian and ideal world. However, does that serve a practical purpose in real life? In this nobody's opinion, no. And my bias tells me that the author is saying so too. An excellent read. I enjoyed this author's first book on algorithms and I notice that there is a certain maturity in his material presentation in this book. It is wonderful to see an author grow and become better. Worth a read and worth the 5 stars. Please buy first hand.

Ellison

I learned a lot from this book about the history of AI development over the last 70 years, but more about how the author and the scientists he interviewed felt about AI and the alignment problem than about the actual problem itself. They all seemed a little naive in thinking the problem is a programming error when it is really a human error. We are not aligned or alignable but 'crooked timber' through and through. Computers might help us follow or reveal the grain of that timber, but the danger seems to be that they might also shape us in ways we don't recognize or control.

'One way to do this, the Berkeley group realized, is to have the system be at some level aware of how difficult it is to design an explicit reward function - to realize that the human users or programmers made their best-faith effort to craft a reward function that captured everything they wanted, but they likely did an imperfect job. In this case even the score is not the score. There is something the humans want, which the explicit objective merely and imperfectly reflects.'

I love that phrase, 'even the score is not the score.' I think I will put it on a T-shirt.
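The quoted Berkeley idea - treat the programmed reward as evidence about what the humans want, not the thing itself - can be sketched by keeping several candidate readings of the designer's intent and scoring actions in expectation across them. The candidates and weights below are invented for illustration:

```python
# "Even the score is not the score": the agent keeps multiple candidate
# reward functions (plausible readings of the designer's intent) with
# weights, and ranks actions by their expected value across all of them.
# Candidates and weights are made up for illustration.

candidates = [
    (0.6, lambda a: {"clean_room": 1.0, "hide_mess": 1.0}[a]),   # literal proxy
    (0.4, lambda a: {"clean_room": 1.0, "hide_mess": -2.0}[a]),  # likely intent
]

def expected_reward(action):
    return sum(weight * reward_fn(action) for weight, reward_fn in candidates)

# Gaming the proxy stops paying once the agent is unsure the proxy is the point:
print(expected_reward("clean_room"))  # 1.0
print(expected_reward("hide_mess"))   # 0.6*1.0 + 0.4*(-2.0) = -0.2
```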

Colin

This wonder of a book is perhaps one of the most comprehensive and even-handed treatises about AI ethics I've ever read. With poetic aplomb, Christian encapsulates the issues surrounding AI ethics in the context of a wider problem - which he aptly names the alignment problem - how to get AI to do what we truly want it to do, rather than what it is merely instructed to do - for it is in the act of communicating input where the ethical quandaries of AI kick in.

Christian starts by articulating the parameters of the AI problem with characteristic lucidity, upending many myths or misconceptions in the public conception of AI. For example, he disabuses any notions that AI bias is a result of the algorithms being malicious or maliciously created - most of the time, it is the training data that is problematic, either in its selection or as a reflection of reality. And when we want to correct for reality bias, removing the features of the data that lead to bias - like gender - doesn't solve the problem, but could perhaps make it worse, because the algorithm will just latch onto a correlated variable, and the lack of an explicit bias signal is just going to make diagnosing the problem harder. Another illuminating fact Christian raises is that the oft-repeated aspiration for AI to be fair is less simple than it seems, for there are many definitions of fairness (minimizing false positives or false negatives, for instance), and it is mathematically impossible to optimize for all of them in most cases. Thus, any algorithm opens itself to criticism from different quarters based on what it is optimizing for, and it is crucial that designers take this into account.

The nice thing about this book, though, is that despite the many pitfalls and complexities of AI, Christian doesn't stick to the doom-and-gloom cautionary tale format that has already been amply demonstrated in other AI non-fiction. Instead, he devotes much of the latter half of the book to the research and achievements of specialists in the field of AI ethics and behavior, which is a slow march towards closing the gap between the input and the intention - or in Christian's words, to minimize the risk of confusing the map with the territory. The research is still early, of course, but it is surprisingly forward thinking. The ultimate goal of closing this gap has been a sort of dialectic journey: first, the development of reinforcement learning (RL) approaches, where we tell the AI what we want in terms of a reward function; then inverse RL, where we demonstrate to the AI what we want (and have it infer or copy the intended behavior); and most recently, a synthesis - how to articulate what we know we want and communicate its essence to the AI. The latter involves not just programmatic commands but somehow giving the AI a sense of agency, through simulating interest, boredom, or even the agency to differentiate what we think we want from what is in our best interest. If a recommender engine were truly ethical, would it continue serving a gambling addict advertisements about online gambling, just because it was reinforced by the gambler's behaviour? And a savvy AI should have some common sense to reject bad inputs, instead of blindly following the script. Would a computer vision algorithm, given the task of classifying a static-filled image as one of two animals, choose an option with absurdly high confidence just because it was not programmed to reject either option? Perhaps a higher standard of ethics is what we should be aiming for, and that necessitates giving AI introspectiveness and a healthy skepticism about the confidence of its own predictions. Although (and this is territory that Christian doesn't wade into) I wonder what this means - would an AI that is truly moral need to be so advanced, self-aware, and deserving of rights, that to exploit it for our convenience would be immoral from our point of view?

In any case, this is an excellent book for any student of AI. I give this: 5 out of 5 games of Montezuma's Revenge.
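The static-filled-image worry in this review corresponds to giving a classifier a reject option: abstain whenever its own confidence falls below a threshold, rather than always returning the argmax. A minimal sketch (logits, labels, and threshold are illustrative, and raw softmax confidence is itself known to be imperfectly calibrated):

```python
import numpy as np

# Sketch of a classifier with a reject option: rather than always choosing
# the argmax (even on static), it abstains when confidence is low.
# Logits and the threshold are illustrative.

def softmax(logits):
    z = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return z / z.sum()

def classify_or_abstain(logits, labels, threshold=0.9):
    probs = softmax(np.asarray(logits, dtype=float))
    best = int(np.argmax(probs))
    return labels[best] if probs[best] >= threshold else "I don't know"

labels = ["cat", "dog"]
print(classify_or_abstain([4.0, 0.5], labels))  # clear input -> "cat" (~0.97)
print(classify_or_abstain([1.1, 1.0], labels))  # static -> "I don't know" (~0.52)
```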

Jay Heinrichs

One of the best nonfiction books I've read this year: a thoughtful, highly readable, and even witty guide to artificial intelligence, machine learning, and the human mind. Any parent should be familiar with the alignment problem, if not with the term itself. When you set a rule for a kid, a smart child will immediately find loopholes. Why? Because your purposes don't directly align with those of the kid.

Author Brian Christian, a visiting scholar at UC Berkeley, covers the history of artificial intelligence from the nineteenth century up through Turing and the present. He gives us a scary, and ultimately hopeful, picture of the future, one in which machines will do our thinking - with or without human collaboration. As someone writing a proposal for a book on self-persuasion, I found this book surprisingly helpful. Neurologists and cognitive scientists, along with linguists and other thinkers navigating the frontier between science and the humanities, are informing the developers of thinking machines. In turn, artificial neural networks and their creators are offering new insights into the ways our human neural networks operate. I first heard about this book from Ezra Klein's podcast, in case you want to give it a listen beforehand. Great summer reading for the nerdy and wonkish type.

Abi Olvera

This book is an excellent, well-researched, and well-presented overview of the current problems in making AI work better and safer. Christian strikes an excellent balance in giving easy-to-understand examples to delve into specific subsets of AI safety (like comparing value alignment to a child who is accidentally encouraged to make messes for the praise of cleaning them up with their toy broom). His style makes questions of curiosity in AI models, inner alignment problems, and issues with transparency easy and approachable. To illustrate issues in transparency, the author examined an AI system which recommended health care options based on risk. The system classified asthmatic patients with pneumonia as low risk, since the AI saw data showing that asthma patients had better survival rates than the average population. However, this was precisely because of the high level of care given to them due to their high risk; doctors typically send them straight to the ICU or other emergency care. AI systems can beat human intelligence in very specific scenarios (chess, Go, etc.), though it's hard to tell what AI might do in cases like this. Glad this book exists to shift conversations toward more AI safety before the models go wrong while we're depending on them.

Amanda

This book has everything: math, social justice, computer science, philosophy, ethics, psychology, behavioral economics. Enjoyably told but not fluffy. Thumbs up.

Cory

Highly recommended. Wonderful primer on ML, AI, and the associated challenges. Some fun anecdotes in there if you're a parent and are interested in how brains (and AIs) develop!

Jade

This is a fantastic book for people who are interested in the ideas of machine learning in the past, present, and near future. I have tried to read a few books on AI before but found them to be either too technical, too basic, or too boring. Not this book. The Alignment Problem is an engrossing and thought-provoking exploration of the intertwined development of machine learning and human learning. Through it all, the book stresses the "alignment problem": how can we develop AI to do what we want it to do, without doing what we don't want it to do? Many types of AI are explored, and all of them have their own advantages and drawbacks. Several times throughout the book, I had to pause to share a particularly insightful thought or idea with someone lest I forget what I had just read. I encourage anyone who is interested in learning to read it now!

Gabriel Gilling

I was expecting this to be another book about the ethics of A.I. that simply reiterates some of the sensational headlines surrounding the topic these past years, but I ended up learning a ton about different types of A.I. algorithms, their underlying assumptions, and their relevance for today's world. The author does a great job of boiling down incredibly complicated research into simple concepts, while also relating why they matter. Recommended both for A.I./data science practitioners and others.

Fernando Rodriguez-Villa

The first several chapters should be required reading for anyone working for / investing in "AI" companies and, frankly, everyone using AI products (which is now everyone)! The middle of the book gets into the recent history of AI techniques around safety, which is probably most interesting for folks with a lot of interest going in. Overall, not as "fun" as Algorithms to Live By, but still a great read by an engaging, thorough author.

Dan Elton

A well-researched book on AI safety, written to be enjoyed by experts and newbies alike! This book is the culmination of *four years* of dedicated work and interviews with over 100 world-class experts. The brilliant thing about this book is that it is so information-dense and full of interesting anecdotes that people of any level of expertise stand to gain something from it. He's carefully tuned it so a wide variety of people can enjoy it without getting bored or overwhelmed.

This book covers the well-known problems of bias and brittleness in machine learning, including the following well-known cases: Richard Caruana's example of the pneumonia triage system that went haywire, the COMPAS parole recommendation system, the Google Photos "gorilla" tag fiasco, word2vec gender bias, and the 2018 fatal Uber car crash in Tempe, Arizona. You'd be mistaken to think of this as just another book warning about data bias, lack of robustness, and the potential for discrimination and the perpetuation of inequalities, however. Sprinkled between the warnings and calls for action are remarkably clear descriptions of modern machine learning techniques and how they relate to and/or were inspired by recent developments in neuroscience, cognitive science, developmental psychology, and the social sciences. The author dives into the nitty-gritty of how present-day AI systems work and does not shy away from explaining current technical challenges. The way he explains reinforcement learning and links it to research on dopamine in the brain was one of the highlights of the book for me (I had forgotten how dopamine was linked to temporal difference error, and his description of the history of dopamine research was fascinating). Not all of the concepts were new to me, but in every case the way he explained each concept was new to me and wonderful to read. I learned new concepts too. For instance, I never understood the difference between "on policy" and "off policy" RL systems until I read his explanation. Other concepts I picked up were "cooperative reinforcement learning", "shaping", and various "impact metrics". If you haven't heard of these terms and are interested in AI safety, I heartily recommend this book.

This book follows a trend of seamlessly linking near-term and far-term AI safety concerns, a trend since the publication of Nick Bostrom's 2014 meditation on far-future AI, Superintelligence. The book is very "down to earth" - you may be surprised that the standard arguments about why we should be concerned about long-term AI risk that we've heard from Elon Musk, Sam Harris, etc. are largely absent from this book (most notoriously, the "paperclip maximizer"). This is refreshing, because those arguments draw on assumptions (such as fast takeoff) which are very hard to defend with empirical data or the current science on AI. (I still find those arguments convincing enough to warrant serious investment of resources to prevent risk, but they aren't necessarily the best first arguments to present to someone.) Instead the author follows an ingenious strategy: he starts with current problems in AI and some near-future concerns (for instance, driverless cars driving off the road or home robots that refuse to be turned off). Then, by providing sufficient technical background, he proceeds to explain why these are really hard problems, some of the solutions that are being worked on, and the limitations of the solutions proposed so far. The book is cautiously optimistic, showing how meaningful progress on the alignment problem is already occurring. So far the problems with AI that we are encountering *right now* appear tractable, which should motivate more people and resources to flow into AI safety, rather than trying to regulate progress to a standstill, which is impossible and likely to be harmful. At the same time, however, by the end of the book the reader will have a deep appreciation of the challenges ahead and the need for extreme caution as we move towards more and more intelligent and powerful AI.
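The on-policy versus off-policy distinction this review singles out comes down to which next-step value anchors the update: on-policy SARSA backs up the action the agent actually takes next, while off-policy Q-learning backs up the best available action whether or not it is taken. A side-by-side sketch with invented values:

```python
# On-policy vs off-policy in one comparison. SARSA (on-policy) uses the
# value of the action actually taken next; Q-learning (off-policy) uses
# the value of the greedy action, taken or not. All numbers are made up.

alpha, gamma = 0.5, 0.9

def sarsa_update(Q, s, a, r, s2, a2):
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions):
    best = max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

Q = {("s1", "go"): 0.0, ("s2", "left"): 0.2, ("s2", "right"): 1.0}
sarsa_update(Q, "s1", "go", r=0.0, s2="s2", a2="left")  # agent explored "left"
print(Q[("s1", "go")])   # 0.09: backed up the explored action's value

Q[("s1", "go")] = 0.0
q_learning_update(Q, "s1", "go", r=0.0, s2="s2", actions=["left", "right"])
print(Q[("s1", "go")])   # 0.45: backed up the best action's value
```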

Amaan Pirani

Incredible. Christian does a great job describing the current state of AI ethics research, in layman's terms, while still remaining thorough.

Christopher W

This book will be very useful in an upcoming applied ML course that I'll be teaching soon.

Andrew

A vivid, highly readable, highly engaging, erudite discussion of the promise of artificial intelligence and the challenge of ensuring that the behavior of AI is well aligned with our desired goals. The "alignment" problem boils down to the challenge of desiring AI to do A while rewarding it when it does B. There are many different ways to read this book, even if you're not totally invested in the science or research around AI. Read it to understand how biased models inform sentencing; how researchers discovered dopamine's relationship to humans' ability to learn; how we as humans contain intrinsic motivation and are developing a whole research area around the utility of curiosity.

The book concludes with reflections on the open questions AI and ethics research has yet to resolve in several areas, perhaps most powerfully that there remain important, ill-defined aspects of human behavior, decision-making, and relationship-building that cannot readily be translated to algorithmic models. Maybe they will be eventually, but then will those models represent the people we are, or were? Will a general artificial intelligence know the difference, or care?

Ceil

Really wonderful book for the layperson who wants to get beneath the surface of the history and implications of AI, but will likely never be ready for an algorithmic deep dive into either. You'll learn as much about humans - child development, motivation, reward systems, and our inability to anticipate every possible outcome of a given series of instructions. Oh, and you'll learn enormous amounts (again, for a layperson) about how various models, theories, and tools have been put to good and dead-end use in the service of creating ever more "intelligent" machines. When you're about halfway through, you'll stop putting quotes around that.

Read by the author, who is good.

Maud van Lier

Even though Christian discusses a lot of important topics in current AI, the writing is long, at times boring, and full of unnecessary details. The book feels like a collection of conversations - which of course it also is - and for me that doesn't do it. However, for those who like books in interview style, and who are not yet familiar with AI, this might definitely be interesting.

Jessica Dai

tl;dr: worth a read!

Really solid overview of the research field that is typically referred to as "responsible AI" (fairness, explainability, deep learning, language models, RL) - this book is therefore unique among tech x society books in the sense that it is highly technical but also [I think] accessible, though I'm probably not the best person to judge that. I'd consider myself pretty familiar with the academic work that this book describes, but Christian packages a really nice story for the history of particular subfields / lines of inquiry, and draws connections to e.g. psych/neuro, and I feel like I learned a lot.

My personal thought on e.g. putting a values-aligned lens on RL agents has always been that I have trouble drawing a line from the academic work to what this means in practice (as opposed to e.g. fairness or language models, where these are related to systems already in production and which are therefore already shaping/reshaping people's lives). I sort of wish this was made clearer! But also, nitpicking lol. Reboot review (not written by me) here.

Suzanne

I think of AI and machine learning as something you'd need a bunch of programming knowledge to understand, and I'm sure you do on a technical level. But Christian does an excellent job explaining the broad strokes of what the AI safety movement is about without much technical detail. I was surprised how much of the book was about the intersections with philosophy, ethics, human learning & psychology, cognitive science, and behavioral economics. Lots to keep pondering, and extra motivation to actually learn machine learning now.